Abstract

We study the problem of generating connected random graphs with
no self-loops or multiple edges and that, in addition, have a given degree
sequence. The generation method we focus on is the edge-switching
Markov-chain method, whose functioning depends on a parameter $w$ related
to the method's core operation of an edge switch. We analyze two
existing heuristics for adjusting $w$ during the generation of a graph and
show that they result in a Markov chain whose stationary distribution
is uniform, thus ensuring that generation occurs uniformly at random.
We also introduce a novel $w$-adjusting heuristic which, even though it
does not always lead to a Markov chain, is still guaranteed to converge to
the uniform distribution under relatively mild conditions. We report on
extensive computer experiments comparing the three heuristics' performance
at generating random graphs whose node degrees are distributed
as power laws.

Keywords: Random-graph generation, edge switch, Markov chain.


1 Introduction 

Let $D = \{d_1, d_2, \ldots, d_n\}$ be a set of nonnegative integers such that $d_1 \ge d_2 \ge
\cdots \ge d_n$ and let $\mathcal{G}_D$ be the set of all connected graphs on $n$ nodes that have
no self-loops or multiple edges and for which $D$ is the degree sequence. That
is, the degree of node $u_j$, $1 \le j \le n$, is $d_j$. We know from […] that $\mathcal{G}_D$ is a
nonempty set, in which case we say that $D$ is realizable, if and only if all the
following conditions hold:
• $\sum_{j=1}^{n} d_j$ is even.

• $\sum_{j=1}^{n} d_j \ge 2(n-1)$.

• $\sum_{j=1}^{k} d_j \le k(k-1) + \sum_{j=k+1}^{n} \min\{k, d_j\}$ for all $k$ such that $1 \le k \le n$.
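The three conditions above translate directly into a check on a candidate sequence. The following Python sketch is ours, not the authors' code; the name `is_realizable` is hypothetical, and it additionally rejects any $d_j = 0$ when $n > 1$, since an isolated node rules out connectivity and the conditions as listed leave this implicit.

```python
def is_realizable(degrees: list[int]) -> bool:
    """Return True if D has a connected simple realization (sketch).

    Checks the three listed conditions on the non-increasing ordering of D,
    plus d_j >= 1 for every j when n > 1 (an assumption made explicit here,
    because an isolated node can never belong to a connected graph).
    """
    d = sorted(degrees, reverse=True)  # d_1 >= d_2 >= ... >= d_n
    n = len(d)
    if n == 0 or (n > 1 and d[-1] < 1):
        return False
    total = sum(d)
    # Condition 1: the degree sum equals 2m, so it must be even.
    if total % 2 != 0:
        return False
    # Condition 2: at least n - 1 edges, as any connected graph needs.
    if total < 2 * (n - 1):
        return False
    # Condition 3 (Erdos-Gallai inequalities), checked for every k.
    prefix = 0
    for k in range(1, n + 1):
        prefix += d[k - 1]
        bound = k * (k - 1) + sum(min(k, d[j]) for j in range(k, n))
        if prefix > bound:
            return False
    return True


print(is_realizable([2, 2, 2]))     # True (a triangle)
print(is_realizable([1, 1, 1, 1]))  # False (only two disjoint edges fit)
```

The loop is quadratic in $n$, which is adequate for illustration; a prefix-sum formulation would make it faster.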

We consider in this paper the problem of generating graphs of $\mathcal{G}_D$ uniformly at
random when $D$ is realizable.

In the absence of the connectivity constraint, the problem of generating random
graphs for a given degree sequence is closely related to some other problems,
like generating a (0,1) matrix with given marginals […], approximating the permanent
of a matrix […], and sampling a perfect matching or an $f$-factor of a
graph […]. However, it remains generally unknown how to generate graphs
uniformly at random for a given degree sequence within reasonable time bounds
[…], even though exceptions exist for some special cases, like regular […] and
bipartite graphs […].

The problem of generating random graphs has recently acquired considerable
prominence from a practical perspective. Many real-world networks, like
the Internet, the WWW, social networks, and scientific-collaboration networks,
typically have a very large number of nodes and evolved over time in such an
unorganized way that only limited information is known about their topologies
[…]; consequently, many studies of their properties have been conducted within a
random-graph framework […]. In addition, these networks are now known to differ
sharply from the classical random-graph model introduced by Erdős and Rényi
[…], in which the node-degree distribution is the Poisson distribution. Some
empirical studies suggest that many of them have node-degree distributions that
seem to conform to a power law […], that is, the probability that a
randomly chosen node has degree $a$ is proportional to $a^{-\tau}$ for some $\tau > 1$.

Clearly, any method for sampling a graph uniformly at random from $\mathcal{G}_D$ for
a given $D$ can be easily extended to generate random graphs having a power-law
node-degree distribution. We first obtain $D$ by sampling each $d_j$ from the
power-law distribution. If $D$ turns out not to be realizable, then we discard it
entirely and obtain a new one, repeating this process as needed. We then
select the desired graph uniformly at random from $\mathcal{G}_D$.
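The rejection procedure just described can be sketched as follows. We assume, purely for illustration, that each $d_j$ is drawn independently from $P(a) \propto a^{-\tau}$ on the support $1 \le a \le n-1$; the support, the function names, and the `max_tries` cap are our assumptions, and the realizability check repeats the three conditions listed in this section.

```python
import random


def is_realizable(d: list[int]) -> bool:
    """Connected-realizability test on a non-increasing sequence (see Section 1)."""
    n = len(d)
    if n == 0 or (n > 1 and d[-1] < 1) or sum(d) % 2 or sum(d) < 2 * (n - 1):
        return False
    prefix = 0
    for k in range(1, n + 1):
        prefix += d[k - 1]
        if prefix > k * (k - 1) + sum(min(k, x) for x in d[k:]):
            return False
    return True


def sample_power_law_degrees(n: int, tau: float, rng: random.Random,
                             max_tries: int = 10_000) -> list[int]:
    """Draw D with P(d_j = a) proportional to a**(-tau), rejecting
    non-realizable sequences until one is accepted."""
    support = list(range(1, n))  # assumed support: 1 <= a <= n - 1
    weights = [a ** (-tau) for a in support]
    for _ in range(max_tries):
        d = sorted(rng.choices(support, weights=weights, k=n), reverse=True)
        if is_realizable(d):
            return d
    raise RuntimeError("no realizable degree sequence found")


D = sample_power_law_degrees(50, 2.2, random.Random(1))
print(len(D), sum(D) // 2)  # number of nodes and number of edges
```

Discarding the whole sequence, rather than repairing it, mirrors the text; as Section 4 notes, this conditioning slightly alters the effective degree distribution.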

Other, more complex methods for generating random graphs having node
degrees distributed as a power law have been proposed. In these methods,
generation is achieved by successively adding nodes and edges to the graph in a
way that tries to follow some principles, like preferential attachment, that are
believed to have guided the evolution of some real-world networks […]. However,
simply generating a graph having a given degree sequence sampled from
the power-law distribution has been observed to perform satisfactorily with
regard to certain measures […]. Moreover, this approach can be used to obtain
random graphs having any node-degree distribution, which is an important
flexibility since correctly determining the node-degree distribution of real-world
networks has remained essentially an open problem […].

Given a realizable $D$, we consider the generation method that we call the
edge-switching Markov-chain (ESMC) method for choosing graphs from $\mathcal{G}_D$
uniformly at random, also known by various other names […]. This
method, which can be modeled as a Markov chain and whose details are
described more thoroughly in Section 2, employs an operation that we call an edge
switch to transform a graph of $\mathcal{G}_D$ into another graph that has the same
degree sequence $D$ (but may not be in $\mathcal{G}_D$ because it is not connected). Let $G$
be the graph being generated. To avoid generating unconnected graphs, we
periodically perform a connectivity test on $G$. If $G$ is unconnected, we undo
all the edge switches performed since the previous connectivity test. Basically,
the method consists of first obtaining a graph $G$ from $\mathcal{G}_D$ deterministically and
then applying a series of edge switches and connectivity tests to $G$ until a
certain halting condition is satisfied. We also discuss in Section 2 a methodology
for obtaining the halting condition, which ultimately also embodies a criterion
for estimating how close $G$ is to a uniformly random sample from $\mathcal{G}_D$.

The ESMC method is intrinsically based on an integer parameter $w \ge 1$
giving the number of edge switches to be attempted between successive
connectivity tests. Naturally, setting $w$ appropriately is crucial to the performance
of the method. When $w$ is too small, a large number of connectivity tests is
performed, which dramatically increases the running time of the method, as
the time complexity of a connectivity test is high in comparison to the time
complexity of an edge switch. On the other hand, when $w$ is excessively large,
the probability that the connectivity test is performed on an unconnected graph
tends to be high, possibly causing many edge switches to be undone. Obtaining
an ideal value for $w$ beforehand seems to be an elusive goal, so heuristics have
been proposed for adjusting $w$ along the algorithm's execution […]. We
discuss the existing heuristics, and also introduce a new one, in Section 3.
We present in Section 4 the results of extensive experiments for degree
sequences sampled from power-law distributions. We evaluate the three heuristics
described in Section 3 along with two different halting conditions. In general,
our experimental results indicate that, on average, our heuristic outperforms
the two existing heuristics in terms of the total running time by a margin of
12% to 86%. We conclude in Section 5.

2 The ESMC method 

We henceforth denote by $G$ the graph being generated, that is, the graph on
which the edge switches and the connectivity tests are performed. An edge
switch is performed on a pair of nonadjacent edges (i.e., edges that share no
nodes) and consists of removing them from $G$ and adding back one of two other
pairs of edges. The pair of edges to be added to $G$ is chosen at random from
these two and the edge switch is only carried through if neither edge of the
chosen pair already exists in $G$. For example, let $(u_j, u_k)$ and $(u_x, u_y)$ be two
nonadjacent edges of $G$. The edge-switching operation on $(u_j, u_k)$ and $(u_x, u_y)$
consists of removing these edges from $G$ and adding to $G$ either $(u_j, u_x)$ and
$(u_k, u_y)$ or $(u_j, u_y)$ and $(u_k, u_x)$. Although node degrees are clearly seen to
remain unchanged by an edge switch, $G$ may become an unconnected graph.
Figure 1: The two possible edge switches on the edges $(u_j, u_k)$ and $(u_x, u_y)$ (a)
and a scenario in which only one of them can be carried through (b).


Figure 1(a) illustrates the two possible edge switches on the edges $(u_j, u_k)$ and
$(u_x, u_y)$. Figure 1(b) illustrates a situation in which only one edge switch can
be carried through on those edges.

The ESMC method is best described in terms of a Markov chain $\mathcal{M}$ having one state
associated with each graph of $\mathcal{G}_D$. If $G_1, G_2, \ldots, G_{|\mathcal{G}_D|}$ are the graphs in $\mathcal{G}_D$
and $X_1, X_2, \ldots$ are the states of $\mathcal{M}$, then we let, for $1 \le i \le |\mathcal{G}_D|$, $X_i$ be
the state in which $G = G_i$. In essence, the ESMC method consists of initially
obtaining a graph of $\mathcal{G}_D$ and then performing a sequence of transitions on $\mathcal{M}$
from the corresponding state until a certain halting condition is satisfied.

In order to obtain the initial graph, we employ the Havel-Hakimi algorithm
[…], which successively adds edges to an initial graph $G$ having $n$ isolated
nodes. For $1 \le j \le n$, along the process let the residual degree $r_j$ of $u_j$ be the
difference between $d_j$ and the number of edges already incident to $u_j$; clearly,
$r_j = d_j$ initially. The algorithm repeatedly selects the node, say $u_k$, having the
highest residual degree and connects it to the $r_k$ nodes having the next highest
residual degrees, which leads to $r_k = 0$ and also to smaller values of the other
nodes' residual degrees. The repetition goes on until $r_j = 0$ for all $j$ such that
$1 \le j \le n$. At this moment, $G$ has degree sequence $D$ but may be unconnected.
Since $D$ is realizable, $G$ must contain a cycle if it is not connected: a forest with
$c \ge 2$ connected components has only $n - c < n - 1$ edges, whereas the second
realizability condition guarantees at least $n - 1$ edges. If we take
an edge of this cycle and an edge of another connected component, and perform
an edge switch on them, then necessarily two of the connected components of
$G$ are merged together into a single one. This process can be repeated until $G$
becomes connected.
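The Havel-Hakimi construction described above can be sketched in Python as follows. The function name and the tie-breaking by node index are our assumptions, and the sketch stops at a possibly unconnected $G$, leaving out the component-merging switches.

```python
from __future__ import annotations


def havel_hakimi(degrees: list[int]) -> set[tuple[int, int]] | None:
    """Build a simple graph on nodes 0..n-1 with the given degrees, or return
    None if the greedy construction gets stuck (D is not graphical)."""
    residual = dict(enumerate(degrees))  # r_j, initially equal to d_j
    edges = set()
    while True:
        # Nodes by decreasing residual degree (ties kept in index order).
        order = sorted(residual, key=lambda u: residual[u], reverse=True)
        k, r = order[0], residual[order[0]]
        if r == 0:
            return edges  # every r_j is 0: G has degree sequence D
        targets = order[1:r + 1]  # the r_k nodes with the next highest residuals
        if len(targets) < r or residual[targets[-1]] == 0:
            return None
        for v in targets:
            edges.add((min(k, v), max(k, v)))
            residual[v] -= 1
        residual[k] = 0  # u_k is now saturated and never chosen again
```

For $D = \{2, 2, 2, 2, 2, 2\}$, which is realizable (by a 6-cycle), this sketch returns two disjoint triangles, which is exactly the unconnected outcome that the merging switches repair.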

Let us then describe what constitutes a transition in $\mathcal{M}$. Let $w \ge 1$ be an
integer parameter. A transition in $\mathcal{M}$ is a sequence of $w$ steps that we call
edge-switching attempts. In each edge-switching attempt, we randomly select
two distinct edges of $G$. If they are not adjacent, then we randomly choose one
of the two possible edge switches. If the chosen edge switch is feasible, that
is, it does not involve adding an edge that already exists in $G$, then we go on
and perform the edge switch; otherwise, $G$ is kept unchanged. After $w$
edge-switching attempts, we perform a connectivity test on $G$. If $G$ turns out to be
unconnected, then we undo all the edge switches performed during the previous
$w$ edge-switching attempts.
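One transition of $\mathcal{M}$ can be sketched in Python as below. The representation (an edge array plus a hash set of edges, giving expected $O(1)$ feasibility checks as a stand-in for the adjacency matrix or incidence trees discussed in the text), the undo log, and all names are our assumptions; nodes are labeled $0, \ldots, n-1$, edges are stored as sorted pairs, and the connectivity test is a plain breadth-first search.

```python
import random
from collections import deque


def is_connected(n: int, edges) -> bool:
    """Breadth-first connectivity test on nodes 0..n-1."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen = [False] * n
    seen[0] = True
    queue = deque([0])
    count = 1
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if not seen[v]:
                seen[v] = True
                count += 1
                queue.append(v)
    return count == n


def transition(n, edge_list, edge_set, w, rng):
    """Perform w edge-switching attempts in place, then test connectivity.
    Returns True if the switches are kept, False if they were undone."""
    undo_log = []  # (i, old edge at i, j, old edge at j), in order
    m = len(edge_list)
    for _ in range(w):
        i, j = rng.sample(range(m), 2)  # two distinct edges
        (a, b), (c, d) = edge_list[i], edge_list[j]
        if len({a, b, c, d}) < 4:
            continue  # adjacent edges: G unchanged
        e, f = ((a, c), (b, d)) if rng.random() < 0.5 else ((a, d), (b, c))
        e, f = tuple(sorted(e)), tuple(sorted(f))
        if e in edge_set or f in edge_set:
            continue  # infeasible switch: G unchanged
        undo_log.append((i, edge_list[i], j, edge_list[j]))
        edge_set -= {edge_list[i], edge_list[j]}
        edge_set |= {e, f}
        edge_list[i], edge_list[j] = e, f
    if is_connected(n, edge_list):
        return True
    for i, old_i, j, old_j in reversed(undo_log):  # undo the last w attempts
        edge_set -= {edge_list[i], edge_list[j]}
        edge_set |= {old_i, old_j}
        edge_list[i], edge_list[j] = old_i, old_j
    return False


rng = random.Random(7)
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 5), (0, 3)]
edge_set = set(edges)
kept = sum(transition(6, edges, edge_set, 3, rng) for _ in range(100))
print(kept, "of 100 transitions kept")
```

Undoing in reverse order matters because a later attempt may reuse an array slot that an earlier attempt already changed.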
Now let $m$ be the number of edges of $G$. If we use an array with the edges
of the graph, an adjacency matrix, and an appropriate collection of incidence
lists and pointers, then an edge switch can be done in $O(1)$ time while requiring
$O(n^2)$ space, which for large $n$ is prohibitive. An alternative is to use an
array with the edges of the graph and an appropriate collection of incidence
trees and pointers, which leads to $O(\log d_1)$ time and $O(m)$ space instead. In
any case, and considering that the connectivity test can be performed in $O(m)$
time, setting $w$ properly is essential to the achievement of good performance.
We return to this issue in Section 3.

In $\mathcal{M}$, a transition exists from $X_i$ to $X_j$, $1 \le i, j \le |\mathcal{G}_D|$, if and only if there
is a sequence of $w$ edge-switching attempts transforming $G_i$ into $G_j$. Let $p_{i,j}$
be the probability associated with this transition. Clearly, $p_{i,j} = p_{j,i}$ so long as
$w$ is constant (every edge switch can be undone with the same probability with
which it was previously done), and $p_{i,i} > 0$ (every edge-switching attempt may
select adjacent edges to switch or an infeasible edge switch). The main results
that pertain to the use of $\mathcal{M}$ in sampling a graph from $\mathcal{G}_D$ uniformly
at random are consequences of the following two classic theorems on Markov
chains […].

Theorem 1. A finite, irreducible, and aperiodic Markov chain converges to a
unique stationary distribution regardless of the initial state.

Theorem 2. Given a finite, irreducible, and aperiodic Markov chain with state
space $\{Y_1, Y_2, \ldots, Y_k\}$, let $q_{i,j}$ be the probability associated with the transition
from $Y_i$ to $Y_j$. If there are nonnegative numbers $\pi_1, \pi_2, \ldots, \pi_k$ such that

$$\sum_{i=1}^{k} \pi_i = 1,$$

and furthermore

$$\pi_i \, q_{i,j} = \pi_j \, q_{j,i}$$

for all $i, j$ such that $1 \le i, j \le k$, then the stationary distribution of this Markov
chain is given by $\pi_1, \pi_2, \ldots, \pi_k$, with the probability associated with $Y_i$ being $\pi_i$,
$1 \le i \le k$.


Corollary 3, given next, follows directly from Theorem 2.

Corollary 3. If $q_{i,j} = q_{j,i}$ for all $i, j$ such that $1 \le i, j \le k$, then the stationary
distribution of the Markov chain of Theorem 2 is the uniform distribution.
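The one-line argument behind Corollary 3 is that uniform weights satisfy both hypotheses of Theorem 2 whenever the transition probabilities are symmetric:

$$
\pi_i = \frac{1}{k} \quad (1 \le i \le k)
\;\Longrightarrow\;
\sum_{i=1}^{k} \pi_i = 1
\quad\text{and}\quad
\pi_i \, q_{i,j} = \frac{q_{i,j}}{k} = \frac{q_{j,i}}{k} = \pi_j \, q_{j,i}.
$$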

Our chain $\mathcal{M}$ is certainly finite and is also irreducible (since there is a
sequence of transitions between any two states of $\mathcal{M}$ […]) and aperiodic (since
$p_{i,i} > 0$ for all $i$ such that $1 \le i \le |\mathcal{G}_D|$). Also, $p_{i,j} = p_{j,i}$ for all $i, j$ such that
$1 \le i, j \le |\mathcal{G}_D|$ if $w$ is constant. By Corollary 3 we then have the following.

Corollary 4. If $w$ is constant, then $\mathcal{M}$ converges to the uniform distribution
regardless of the initial state.


We finalize the section by discussing a halting condition for the ESMC
method. For $t \ge 1$, let $g(t)$ be a function of $G$ right after the $t$th transition. Let
also

$$\bar{g}(t) = \frac{g(0) + g(1) + \cdots + g(t)}{t + 1}, \tag{1}$$
where $g(0)$ refers to the initial $G$. The quantity in (1) is known to give an
unbiased estimator of the expected value of $g(t)$ under the stationary
distribution whenever Theorem 1 holds [20]. What we do is to use $\bar{g}(t)$ as an indirect
indicator of the convergence of $\mathcal{M}$. Let $\delta \ge 1$ and $\gamma > 0$ be two parameters,
the former an integer. Our halting condition after the $t$th transition is that the
inequality

$$\left| \frac{\bar{g}(z) - \bar{g}(t - \delta)}{\bar{g}(t - \delta)} \right| < \gamma \tag{2}$$

hold right after each of the $\delta$ most recent transitions that precede (inclusively)
the $t$th one (that is, for $t - \delta + 1 \le z \le t$). The efficacy of this halting
condition clearly depends on the function $g(t)$. In Section 4 we present
computational results for two different choices of $g(t)$.
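A minimal Python sketch of the running mean (1) and the test (2). The class name, the use of the absolute relative change, and the choice never to halt when $\bar{g}(t-\delta) = 0$ are our assumptions.

```python
class HaltingMonitor:
    """Maintains the running mean (1) and evaluates halting condition (2)."""

    def __init__(self, g0: float, delta: int, gamma: float):
        self.delta = delta    # number of most recent transitions examined
        self.gamma = gamma    # relative tolerance
        self.total = g0       # g(0) + g(1) + ... + g(t)
        self.means = [g0]     # means[t] is the running mean after transition t

    def update(self, g_t: float) -> bool:
        """Record g(t) right after transition t; return True to halt."""
        self.total += g_t
        t = len(self.means)
        self.means.append(self.total / (t + 1))
        if t < self.delta:
            return False  # the reference value bar g(t - delta) does not exist yet
        ref = self.means[t - self.delta]
        if ref == 0:
            return False  # relative change undefined; keep going (assumption)
        return all(abs(self.means[z] - ref) / abs(ref) < self.gamma
                   for z in range(t - self.delta + 1, t + 1))
```

In an ESMC loop, `update` would be called once per transition with the current value of $g(t)$; the first possible halt is after transition $\delta$.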


3 Heuristics for parameter adjustment 

As we remarked at the end of Section 2, adjusting $w$ along the evolution of $\mathcal{M}$ is
a viable alternative, aiming at better convergence properties, to fixing its value
at the outset. In this section we discuss some heuristics to do this. Each
transition now consists of performing $w$ edge-switching attempts, a connectivity test
(with the ensuing possible undoing of all the edge switches performed during
the $w$ attempts), and moreover an update of the value of $w$. We consider two
approaches to adjusting $w$. The first consists of a mechanism that is used in
all existing heuristics for adjusting $w$ in accordance with the result of the
previous connectivity test. The other one is a new heuristic that adjusts $w$ aiming
at approximating a given probability for the success of the next connectivity
test. Notice that, in either case, Corollary 4 is no longer applicable and the
convergence of $\mathcal{M}$ has to be re-examined.

3.1 Two current heuristics 

Let us begin with the first approach. We start with $w = 1$ and increase the
value of $w$ whenever the connectivity test succeeds; we decrease it otherwise.
As we demonstrate next, there is a Markov chain associated with this approach
that has a uniform stationary distribution.

Let $\mathcal{M}'$ be a Markov chain whose states are each associated with a graph
of $\mathcal{G}_D$ and a value of $w$. We denote by $X'_{i,a}$ the state of $\mathcal{M}'$ associated with
$G_i$ and $w = a$. While $\mathcal{M}'$ models the approach in question faithfully, it has
more than one state associated with each graph of $\mathcal{G}_D$, and using it directly in
our analysis may prove cumbersome. We then introduce another Markov chain,
denoted by $\mathcal{M}''$ and having only $|\mathcal{G}_D|$ states, each associated with a graph of
$\mathcal{G}_D$. We denote by $X''_i$ the state of $\mathcal{M}''$ associated with $G_i$. This state is the
union of $X'_{i,a}$ for all $a \ge 1$, i.e., $X''_i$ results from clustering together all the states
of $\mathcal{M}'$ that correspond to $G_i$.

In order to make the state space of $\mathcal{M}'$ finite, we limit the value of $w$ by
a fixed upper bound, henceforth denoted by $W$. This strategy not only makes
$\mathcal{M}'$ a finite Markov chain, which is crucial to the analysis that follows, but
also avoids excessively large $w$ values, which may jeopardize the approach's
efficacy, especially in relation to the halting condition, as $g(t)$ may end up being
calculated too sporadically with respect to the edge-switching attempts.

It is a consequence of our discussion of Section 2 that, in $\mathcal{M}'$, any state
$X'_{j,a}$ is reachable from any state $X'_{i,a}$ without even going through states for
which $w \ne a$. As any of the involved transitions corresponds unequivocally to a
transition in $\mathcal{M}''$, it follows immediately that $\mathcal{M}''$ is irreducible and aperiodic.
By Theorem 1, $\mathcal{M}''$ converges to a unique stationary distribution.

Now let $X'_{i,a}$ and $X'_{j,b}$ be any two states of $\mathcal{M}'$. The existence of a transition
from $X'_{i,a}$ to $X'_{j,b}$ means that there is a sequence of $a$ edge-switching attempts
transforming $G_i$ into $G_j$ and updating $w$ to $b$. Since every edge switch can
be undone (as before, with the same probability with which it was previously
done), there is also a sequence of $a$ edge-switching attempts transforming $G_j$
into $G_i$ and updating $w$ to $b$ (i.e., from $X'_{j,a}$ to $X'_{i,b}$). If $p''_{i,j}$ is the probability
associated with the transition from $X''_i$ to $X''_j$ in $\mathcal{M}''$, then clearly $p''_{i,j} = p''_{j,i}$,
and we have the following consequence of Corollary 3.

Corollary 5. $\mathcal{M}''$ converges to the uniform distribution regardless of the initial
state.

Two heuristics for adjusting $w$ based on the outcome of the connectivity
test have been proposed. In the first heuristic, which is a variation of the one
introduced by Gkantsidis, Mihail, and Zegura in […] and is henceforth referred
to as the GMZ heuristic, $w$ is updated to $w + 1$ when the connectivity test
is successful and to $\lceil w/2 \rceil$ otherwise.¹ The other heuristic, due to Viger and
Latapy [22] and henceforth referred to as the VL heuristic, is based on two
parameters, $q^+$ and $q^-$, such that $q^+ > 0$ and $0 < q^- < 1$. It prescribes that $w$
be updated to $(1 + q^+)w$ when the connectivity test succeeds and to $(1 - q^-)w$
otherwise. In [22] it is suggested that these two parameters be adjusted in such
a way as to satisfy $q^+/q^- = e - 1$. We report on experiments with these two
heuristics in Section 4.
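The two update rules can be sketched in Python as below, each capped by the upper bound $W$ introduced above. For the VL rule we assume, since the text does not say how a non-integer $(1 + q^+)w$ or $(1 - q^-)w$ becomes a number of attempts, that a real-valued $w$ is kept internally, clamped to $[1, W]$, and its integer part is used; the names `gmz_update` and `VLRule` are hypothetical.

```python
import math


def gmz_update(w: int, connected: bool, W: int) -> int:
    """GMZ variation: w + 1 after a successful test, ceil(w/2) otherwise."""
    new_w = w + 1 if connected else (w + 1) // 2  # (w + 1) // 2 == ceil(w / 2)
    return max(1, min(new_w, W))


class VLRule:
    """VL rule: w <- (1 + q+) w on success, (1 - q-) w on failure."""

    def __init__(self, q_plus: float, W: int):
        q_minus = q_plus / (math.e - 1)  # enforce q+ / q- = e - 1
        if not (q_plus > 0 and 0 < q_minus < 1):
            raise ValueError("need q+ > 0 and 0 < q- < 1")
        self.q_plus, self.q_minus, self.W = q_plus, q_minus, W
        self.w_real = 1.0  # start with w = 1, as both heuristics do

    def update(self, connected: bool) -> int:
        factor = 1 + self.q_plus if connected else 1 - self.q_minus
        # Assumption: keep the real-valued w within [1, W].
        self.w_real = min(max(self.w_real * factor, 1.0), self.W)
        return int(self.w_real)  # integer number of attempts
```

With $q^+ = 0.1$, a run of successes grows $w$ geometrically but slowly (eight successes take it only from 1 to 2), which illustrates why a heuristic that starts near its working value of $w$ can save time.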

3.2 A new heuristic 

Let $\alpha$ be such that $0 < \alpha < 1$. We introduce a new heuristic to adjust $w$ whose
goal is to achieve a constant probability $\alpha$ for the success of the next connectivity
test. The new heuristic relies on a special connectivity test, whose details are
described in the Appendix, that not only checks whether $G$ is connected but also
calculates the probability that $G$ remains connected after an edge-switching
attempt. We refer to this new heuristic as the SB heuristic.

¹The original heuristic in […] differs from this variation in two ways. First, it forces the
probability of remaining at the same state after a transition to be at least 0.5; second,
the choice of the two edges to undergo a switch is restricted to nonadjacent edge pairs only.
However, by adopting our variation of the heuristic, which lets adjacent edge pairs be chosen
as well, the probability of remaining at the same state is automatically reinforced.
For $1 \le i \le |\mathcal{G}_D|$, let $p_i$ be the probability that $G_i$ remains connected
after an edge-switching attempt. The SB heuristic is based on the assumption
that the probability that a connectivity test succeeds after $w$ consecutive
edge-switching attempts starting at $G_i$ is $p_i^w$. In other words, we assume that the
probability that a graph remains connected after each of the $w$ edge-switching
attempts is $p_i$, and also that it suffices that one single edge switch yields an
unconnected graph in order for the next connectivity test to be unsuccessful.
We note that the latter assumption makes special sense under power-law
node-degree distributions, since in such cases random node deletions are not likely to
split the graph into more than one relatively large connected component […].
What this means is that, when an edge switch renders the graph unconnected,
the forthcoming connectivity test can only succeed if a subsequent edge switch
is performed on edges from different connected components, that is, most likely
on at least one edge belonging to a relatively small connected component, which
is a low-probability event.

In order to obtain $p_i$, we calculate the number of pairs of edges of $G_i$ on
which performing an edge switch generates an unconnected graph. Let $(u_j, u_k)$
and $(u_x, u_y)$ be two edges of $G_i$. We say that $(u_j, u_k)$ and $(u_x, u_y)$ are neighbors
if at least one other edge joins two of the four nodes in $G_i$. Clearly, an edge
switch can only make $G_i$ unconnected if the two edges involved in the switch
constitute a cut of $G_i$. In addition, it is also necessary that the edge switch be
performed on two edges that are not neighbors. Given two nonadjacent edges
$(u_j, u_k)$ and $(u_x, u_y)$ that constitute a cut of $G_i$ and moreover are not neighbors,
only one of the two possible edge switches generates an unconnected graph. This
is illustrated in Figure 2: in part (a), each edge is, individually, a cut of the
graph, constituting what we call a nonadjacent, non-neighbor bridge pair; in
part (b), only together are the two edges a cut of the graph, constituting what
we call a nonadjacent, non-neighbor pair cut.

Clearly, there are $\binom{m}{2} = m(m-1)/2$ pairs of distinct edges, and on each one
we may perform up to two edge switches, depending on how many are feasible.
Let $\mu_i^b$ be the ratio of the number of nonadjacent, non-neighbor bridge pairs in
$G_i$ to $m(m-1)$. Note that $\mu_i^b$ gives the probability that we choose a nonadjacent,
non-neighbor bridge pair and perform on it the edge switch that produces an
unconnected graph, since each pair is chosen with probability $2/(m(m-1))$ and
each of its two edge switches with probability $1/2$. Likewise, let $\mu_i^c$ be the ratio
of the number of nonadjacent, non-neighbor pair cuts in $G_i$ to $m(m-1)$. Then $\mu_i^c$
is the probability that we choose a nonadjacent, non-neighbor pair cut and perform
on it the edge switch that produces an unconnected graph. We clearly have

$$p_i = 1 - \mu_i^b - \mu_i^c. \tag{3}$$

In the Appendix we give a connectivity test that calculates the value of $p_i$ and is
asymptotically no harder than depth-first search in the worst case.
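The DFS-based test of the Appendix is not reproduced here. As a hedged cross-check of the definitions, the following brute-force Python sketch (our own, taking $O(m^2(n+m))$ time) computes $p_i$ directly from its meaning: every unordered pair of distinct edges is chosen with probability $2/(m(m-1))$ and each of its two switches with probability $1/2$, so each feasible switch that disconnects $G_i$ lowers $p_i$ by $1/(m(m-1))$. Counting those switches therefore accounts for both $\mu_i^b$ and $\mu_i^c$ in (3) at once.

```python
from collections import deque
from itertools import combinations


def is_connected(n, edges) -> bool:
    """Breadth-first connectivity test on nodes 0..n-1."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen = {0}
    queue = deque([0])
    while queue:
        for v in adj[queue.popleft()]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == n


def prob_stays_connected(n: int, edges) -> float:
    """Brute-force p_i for a connected graph on nodes 0..n-1."""
    edge_set = {tuple(sorted(e)) for e in edges}
    m = len(edge_set)
    disconnecting = 0
    for (a, b), (c, d) in combinations(sorted(edge_set), 2):
        if len({a, b, c, d}) < 4:
            continue  # adjacent pair: no switch
        for e, f in (((a, c), (b, d)), ((a, d), (b, c))):
            e, f = tuple(sorted(e)), tuple(sorted(f))
            if e in edge_set or f in edge_set:
                continue  # infeasible switch: G_i unchanged
            switched = (edge_set - {(a, b), (c, d)}) | {e, f}
            if not is_connected(n, switched):
                disconnecting += 1
    return 1 - disconnecting / (m * (m - 1))


path = [(0, 1), (1, 2), (2, 3), (3, 4)]
print(prob_stays_connected(5, path))  # about 0.9167 (= 11/12)
```

On the path $u_0 u_1 u_2 u_3 u_4$, the only disconnecting switch turns $(u_0,u_1), (u_3,u_4)$ into $(u_0,u_4), (u_1,u_3)$, splitting off $\{u_0, u_4\}$; those two edges are indeed a nonadjacent, non-neighbor bridge pair.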

If $G_i$ is the graph obtained right after a connectivity test, then the intuition
behind the SB heuristic indicates that $w$ should be adjusted in a way that leads
to $\alpha = (p_i)^w$, yielding

$$w = \frac{\ln \alpha}{\ln p_i}. \tag{4}$$
Figure 2: The two possibilities for an edge switch to produce an unconnected
graph. In part (a), $(u_j, u_k)$ and $(u_x, u_y)$ constitute a nonadjacent, non-neighbor
bridge pair. In part (b), $(u_j, u_k)$ and $(u_x, u_y)$ constitute a nonadjacent,
non-neighbor pair cut. The dashed lines delimit the connected components that
appear when the edges crossing them are removed from the graph.


Notice, however, that each graph $G_i$ of $\mathcal{G}_D$ may have a different $p_i$, so the
Markov chain modeling this method might converge to a stationary distribution
that is different from the uniform distribution. For this reason, we define $\bar{p}(t)$ to
be the average of every $p_i$ obtained right after each of the first $t + 1$ connectivity
tests (the initial one and the $t$ others that correspond to transitions). We then
let the SB heuristic adjust $w$ according to

$$w = \left\lceil \frac{\ln \alpha}{\ln \bar{p}(t)} \right\rceil \tag{5}$$


right after the $t$th connectivity test. Note, in connection with (5), that $w$ is
assuredly a positive integer. Furthermore, for the reasons discussed in Section 3.1,
we limit $w$ by a fixed upper bound $W$.
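A minimal sketch of the SB update (5) in Python, assuming each connectivity test also reports its $p_i$, as the Appendix's test does. The class name, and the values returned in the degenerate cases $\bar{p}(t) = 1$ (where (5) is undefined) and $\bar{p}(t) = 0$, are our assumptions.

```python
import math


class SBRule:
    """SB heuristic: w = ceil(ln(alpha) / ln(p_bar(t))), capped at W."""

    def __init__(self, alpha: float, W: int):
        if not 0 < alpha < 1:
            raise ValueError("alpha must lie in (0, 1)")
        self.alpha, self.W = alpha, W
        self.p_sum = 0.0  # sum of every p_i obtained so far
        self.count = 0    # number of connectivity tests so far (t + 1)

    def update(self, p_i: float) -> int:
        """Record the p_i of the latest connectivity test and return the new w."""
        self.p_sum += p_i
        self.count += 1
        p_bar = self.p_sum / self.count
        if p_bar >= 1.0:
            return self.W  # every observed p_i was 1: (5) undefined, use W (assumption)
        if p_bar <= 0.0:
            return 1       # degenerate case, kept for safety (assumption)
        w = math.ceil(math.log(self.alpha) / math.log(p_bar))
        return max(1, min(w, self.W))


rule = SBRule(alpha=0.1, W=10_000)
print(rule.update(0.99))  # ceil(ln 0.1 / ln 0.99) = 230
```

Because $\bar{p}(t)$ averages over all tests so far, each new $p_i$ moves $w$ less and less, which is what allows $w$ to settle toward a constant.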

We remark, finally, that as a consequence of $w$ being adjusted as a function
of every $p_i$ ever obtained, the method cannot be modeled as a Markov chain and,
to be rigorous, can no longer even be treated as a variation of the ESMC method
in which another heuristic is used. However, if $\bar{p}(t)$ converges as $t \to \infty$, then $w$
also converges. In this case, $w$ approaches a constant and, as noted in Section 2,
we once again have a method that can be modeled as a Markov chain having
a uniform stationary distribution. In Section 4, our approach to assessing the
convergence of $\bar{p}(t)$ (and, consequently, of $w$) is to compare the average value of
$g(t)$ at the end of an execution under the SB heuristic to those obtained under
the GMZ and VL heuristics. As we demonstrate in that section, the figures for
the SB heuristic vary within relatively small percentages with respect to those
of either of the other two heuristics, and we take this as an indication that $\bar{p}(t)$
is close to convergence. In what follows, then, we continue to refer to the SB
heuristic as an alternative for use with the ESMC method.
4 Experimental results

In this section we present experimental results for the three heuristics of
Section 3. We have concentrated on power laws with $\tau = 2.0, 2.1, \ldots, 3.0$ and set
$n = 10^{\ldots}$. All experiments were carried out on a Pentium 4HT running at 3GHz
with 1GB of main memory. All running times we report refer to total elapsed
times under a Linux operating system hosting a single user.

Before discussing our experiments, we pause momentarily to elaborate on a
curious behavior of the power-law distribution. From Section 1 we know that,
in order for $D$ to be realizable, its average node degree must be no less than the
average node degree of a tree, which is approximately 2 for sufficiently large $n$.
For $n = 10^{\ldots}$, this is expected to hold only for $\tau \lesssim 2.47$, meaning that for $\tau \gtrsim 2.47$,
$D$ is expected not to be realizable. By requiring realizability as we repeatedly
sample $D$ from the power law, we are in fact making the node-degree distribution
slightly different from that very power law. What we have observed is that,
for $\tau \gtrsim 2.47$, the node-degree variance for realizable degree sequences tends to
increase with $\tau$ while the number of edges remains roughly constant. These
characteristics have strongly affected the results we present next.

In our experiments, we used $W = 10^{\ldots}$. For each value of $\tau$, we sampled
600 realizable degree sequences and, for each of them, executed the generation
method using the three heuristics and two distinct halting conditions. We
carried out the VL heuristic for $q^+ = 0.1, 0.2, 0.3$ and set $q^-$ in such a way that
$q^+/q^- = e - 1$. The SB heuristic was carried out for $\alpha = 0.1, 0.2, 0.3$.

We have focused on analyzing four indicators, each calculated from the 600
executions with each heuristic and each halting condition. The first one, which
we denote by $R_{\mathrm{conv}}$, is the ratio of the average $\bar{g}(t)$ value at the end of an
execution to the average value of $g(t)$, also at the end of an execution. $R_{\mathrm{conv}}$
can be used as a source of information on the convergence of the Markov chain,
as we know that $\bar{g}(t)$ is an unbiased estimator for the expected value of $g(t)$.
Generally, the deviation of $R_{\mathrm{conv}}$ from 1 grows with how far the generated graph
is from a uniformly random sample of $\mathcal{G}_D$. The second indicator, which we denote
by $R_{\mathrm{switch}}$, is the average number of edge switches performed during an execution
that are not undone as a result of the connectivity test. The third indicator, which we
denote by $R_w$, is the average value of $w$ at the end of an execution. The last
indicator, finally, is the average running time (in minutes) of an execution and
is denoted by $R_{\mathrm{time}}$.

4.1 Halting on the clustering coefficient 

For the first halting condition, we have let $g(t)$ be the clustering coefficient of
$G$. This coefficient is the ratio of three times the number of triangles in $G$ to
the number of two-edge paths (connected triples of nodes) in $G$ (each triangle
corresponds to three such paths) [21]. Calculating the clustering coefficient requires
$O(d_1 m)$ time, as a triangle is identified by checking whether an edge's end nodes
have a common neighbor. We have used $\delta = 60$ and $\gamma = 10^{\ldots}$ for this halting condition.
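The clustering coefficient, computed the way the text describes (a common-neighbor check per edge), can be sketched in Python as follows. The function name is ours, and returning 0 for a graph with no connected triples is our convention.

```python
from collections import defaultdict


def clustering_coefficient(edges) -> float:
    """3 * (#triangles) / (#connected triples), via common neighbors per edge."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # Each triangle is detected once from each of its three edges; a set
    # intersection costs O(min degree) <= O(d_1), so this loop is O(d_1 m).
    hits = sum(len(adj[u] & adj[v]) for u, v in edges)
    triangles = hits // 3
    # A node of degree k is the middle node of k(k - 1)/2 two-edge paths.
    triples = sum(len(nb) * (len(nb) - 1) // 2 for nb in adj.values())
    return 3 * triangles / triples if triples else 0.0


print(clustering_coefficient([(0, 1), (1, 2), (0, 2), (2, 3)]))  # 0.6
```

In the example, a triangle with one pendant edge has one triangle and five connected triples, giving $3 \cdot 1 / 5 = 0.6$.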
With regard to our discussion at the end of Section 3.2 on the convergence
of $\bar{p}(t)$, we have observed the average value of $g(t)$ under the SB heuristic to
vary within only roughly 5% of the values obtained for the GMZ heuristic for
most values of $\tau$, the exceptions being $\tau = 2.4$ (7.8%) and $\tau = 2.5$ (6.3%). As
for the VL heuristic, the percentage drops to roughly 3%, the exceptions being
the same, with 5.7% for $\tau = 2.4$ and 5% for $\tau = 2.5$.

Figure 3 shows the results obtained with this halting condition for the GMZ
heuristic (parts (a-d)), the VL heuristic (e-h), and the SB heuristic (i-l). The
plots for $R_{\mathrm{conv}}$ (Figure 3(a, e, i)) show that $R_{\mathrm{conv}}$ is close to 1 for all
three heuristics, especially when $\tau < 2.1$ or $\tau > 2.8$. The plots for $R_{\mathrm{switch}}$
(Figure 3(b, f, j)) show that the smallest value of $R_{\mathrm{switch}}$ is obtained for $\tau \approx 2.4$,
suggesting that the clustering coefficient converges faster for such a value of $\tau$.
The parameters $q^+$ and $q^-$ of the VL heuristic and $\alpha$ of the SB heuristic seem,
curiously, to have little impact on $R_{\mathrm{switch}}$. Furthermore, since $m$ is almost
constant for $\tau > 2.5$, $R_{\mathrm{switch}}$ does not seem to be proportional to $m$, as assumed
in the analysis conducted in [22] for a slightly different power law. The plots
for $R_w$ (Figure 3(c, g, k)) show that the smallest value of $R_w$ is also obtained
when $\tau \approx 2.4$, indicating that the probability that an edge-switching attempt
results in an unconnected $G$ is larger when $\tau \approx 2.4$. We note that the highest
$R_w$ is obtained with the SB heuristic. The reason for this behavior seems to
be that both the GMZ heuristic and the VL heuristic start with $w = 1$, while
the SB heuristic starts with $w$ relatively close to $R_w$. The plots for $R_{\mathrm{time}}$
(Figure 3(d, h, l)) show that the SB heuristic yields on average the smallest
running time, despite employing a more complex connectivity test. For example,
the SB heuristic has on average outperformed the GMZ heuristic by roughly
12% when $\tau = 2.0$, 44% when $\tau = 2.3$, 61% when $\tau = 2.6$, and 74% when
$\tau = 3.0$. In comparison to the VL heuristic, these figures have been roughly
21% when $\tau = 2.0$, 25% when $\tau = 2.3$, 51% when $\tau = 2.6$, and 56% when
$\tau = 3.0$. Regarding the value of $\alpha$, the smallest average $R_{\mathrm{time}}$ for the SB
heuristic corresponds to $\alpha = 0.1$. We expect $R_{\mathrm{time}}$ to decrease even more if we
continue decreasing $\alpha$, but this decrease will probably be progressively smaller
until an optimal value of $\alpha$ is achieved. Also, it is curious to note that, for $\tau$
near 3.0, the GMZ heuristic yields the smallest $R_{\mathrm{switch}}$ but the highest $R_{\mathrm{time}}$
in comparison to the other heuristics. In this situation, $R_w$ is so small that,
even while performing substantially fewer edge switches, the ESMC method requires on
average much longer to conclude.

Figure ^a) presents the average at the end of an execution for the SB 
heuristic when the halting condition is based on the clustering coefficient. The 
value of T for which we obtain the highest average is 2.4, in accordance with 
the fact that Ry, is on average minimum for this same value (cf. Figure |2Ic, g, 
k)). When r is decreased from 2.4, on average /r)’ decreases as well, since the 
graph is expected to have more edges and, consequently, less bridges. When r 
is increased from 2.4, on average also decreases. The reason in this case is 
that, since the number of edges remains practically constant as r is increased 
from 2.4, and moreover the variance within the degree sequence increases, the 
graph tends to acquire several star-like subgraphs and therefore the fraction of 
Figure 3: Experimental results for the GMZ heuristic (a-d), the VL heuristic
(e-h), and the SB heuristic (i-l) when the halting condition is based on the
clustering coefficient. Plots refer to R_conv (a, e, i), R_switch (b, f, j),
R_w (c, g, k), and R_time (d, h, l).

Figure 4: Plots for μ_i^b (a), μ_i^p (b), and R_time when pair cuts are ignored
(c). The halting condition is the one based on the clustering coefficient.


adjacent or neighbor bridge pairs is expected to increase. Figure 4(b) refers to
μ_i^p. Its behavior is similar, albeit on a much smaller scale, thus indicating
that the fraction of nonadjacent, non-neighbor pair cuts in graphs whose node
degrees are power-law-distributed is on average negligible. If we ignore pair
cuts and use p_i = 1 − μ_i^b in lieu of the expression that also accounts for
μ_i^p, then we obtain the figures for R_time shown in Figure 4(c). In this case
R_time is on average significantly smaller than when pair cuts are not ignored
(Figure 3(l)). This decrease is on average larger when d_1 is expected to be
smaller: since for small d_1 the time complexity of calculating the clustering
coefficient is relatively close to that of a connectivity test, speeding up the
connectivity test has a stronger impact on the overall running time. For
example, when τ = 2.4, in which case we have observed the value of d_1 to be
relatively small on average, ignoring pair cuts leads to a decrease in R_time of
about 31% on average. By contrast, when τ = 2.0, in which case we have observed
the opposite trend regarding the value of d_1, the decrease in R_time is about 7%.

4.2 Halting on the average distance between nodes 

The second halting condition is based on letting ρ(t) be the average distance
between the nodes of G, which can be calculated by conducting a breadth-first
search rooted at each node of G. This calculation requires O(nm) time, which is
more than the time needed to calculate the clustering coefficient. We have used
δ = 30 and γ = 10^−… for this halting condition.
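As a rough illustration of this O(nm) computation, the following Python sketch computes ρ(t) as the average distance over all ordered pairs of distinct nodes, using one breadth-first search per node. The adjacency-dict representation and the function name `average_distance` are assumptions made for illustration; the graph is assumed connected, as it always is during the ESMC method.

```python
from collections import deque


def average_distance(adj):
    """Average shortest-path length over all ordered pairs of distinct nodes.

    `adj` maps each node to an iterable of its neighbours. The graph is
    assumed connected. One BFS per node costs O(n + m), so the total is
    O(n(n + m)) = O(nm) for a connected graph (where m >= n - 1).
    """
    n = len(adj)
    total = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:          # first visit gives the BFS distance
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
    return total / (n * (n - 1))
```

For a path on three nodes the distances are 1, 2, 1, 1, 2, 1 over the six ordered pairs, so the average is 8/6.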

Returning once again to the issue raised at the end of Section 3.2 on the
convergence of ρ(t), for this second halting condition we have observed the
average value of ρ(t) under the SB heuristic to stay within roughly 1% of the
values obtained for the GMZ heuristic for all values of τ. The same holds for
the VL heuristic, except for τ = 2.1 (1.8%).

Figure 5 shows the results when this is the halting condition for the GMZ
heuristic (parts (a-d)), the VL heuristic (e-h), and the SB heuristic (i-l). The
plots for R_conv (Figure 5(a, e, i)) show that R_conv is relatively far from 1 in
comparison to the results obtained with the first halting condition (Figure 3(a,
e, i)). In order to obtain R_conv closer to 1, we may need to increase δ and/or
decrease γ. Although not as close to 1, the value of R_conv is almost the
same regardless of which heuristic is used to adjust w. The plots for R_switch
(Figure 5(b, f, j)) show that the smallest value of R_switch occurs when τ ≈ 2.4.
As in the case of the clustering coefficient, this suggests that the average
distance between nodes converges faster when τ ≈ 2.4.^4 The plots for R_w
(Figure 5(c, g, k)) also show that the smallest R_w is obtained for τ ≈ 2.4.
Regarding R_time (Figure 5(d, h, l)), the plots show that the SB heuristic once
again leads to the smallest running time on average. For example, on average
the SB heuristic outperforms the GMZ heuristic by roughly 77% when τ = 2.0,
86% when τ = 2.3, 85% when τ = 2.6, and 75% when τ = 3.0. In comparison
to the VL heuristic, on average the SB heuristic outperforms it by roughly
41% when τ = 2.0, 80% when τ = 2.3, 82% when τ = 2.6, and 54% when
τ = 3.0. The average gain obtained with the SB heuristic is higher under this
halting condition, which can be explained by noting that each transition is now
slower than under the halting condition based on the clustering coefficient. As a
consequence, the impact of adjusting w properly is more pronounced under the
average-distance halting condition. Also, unlike what occurs with the first
halting condition, the gain obtained with the SB heuristic is now higher when τ
is around 2.4. This suggests that the choice of ρ(t) requires careful
consideration of each application's particularities. Regarding the value of α,
the SB heuristic once again leads to the smallest R_time when α = 0.1,
suggesting that the optimal value of α is less than 0.1.

Figure 6 presents, respectively in parts (a) and (b), the average μ_i^b and μ_i^p
for the SB heuristic when the halting condition is based on the average distance
between nodes. The results are similar to those shown in Figure 4(a, b) for
the halting condition based on the clustering coefficient. The plots for R_time
(Figure 6(c)), on the other hand, show a very different behavior: for almost all
values of τ, R_time is now seen to increase slightly when pair cuts are ignored.
The reason for this behavior seems to be an insufficient number of samples; in
principle, we expect R_time to be very slightly smaller than when pair cuts are
not ignored. Since the time complexity of calculating the average distance
between nodes is significantly higher than that of a connectivity test,
ignoring pair cuts is expected to have only a small impact on the overall
running time of the method.


5 Conclusions 

We have considered the problem of generating, uniformly at random, connected
graphs that have a given degree sequence but no multiple edges or self-loops. We
studied the ESMC method, which employs edge switches to transform a graph
into another while preserving the degree sequence. This method consists of first
deterministically finding a graph with the desired properties and then
performing random edge switches, interleaved with connectivity tests, to obtain
a randomized result.

^4 This agreement of the three heuristics under either halting condition may in
fact indicate that the ESMC method itself converges faster for this value of τ.

Figure 5: Experimental results for the GMZ heuristic (a-d), the VL heuristic
(e-h), and the SB heuristic (i-l) when the halting condition is based on the
average distance between nodes. Plots refer to R_conv (a, e, i), R_switch (b, f,
j), R_w (c, g, k), and R_time (d, h, l).

Figure 6: Plots for μ_i^b (a), μ_i^p (b), and R_time when pair cuts are ignored
(c). The halting condition is the one based on the average distance between
nodes.

We showed that, if we attempt to perform a constant number w of edge
switches between successive connectivity tests, then the method can be
modeled as a Markov chain having a uniform stationary distribution. We also
showed that, if w is not constant but rather is adjusted as a function of the
last connectivity test's outcome, then the method can still be modeled as a
Markov chain with a uniform stationary distribution.
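To make the constant-w chain concrete, the following Python sketch attempts w edge switches, then runs a connectivity test and, if the graph has become disconnected, restores the graph held before those w attempts. It is only a sketch under stated assumptions: nodes are labeled 0, ..., n − 1, edges are stored as sorted tuples, a switch replaces (a, b) and (c, d) by (a, d) and (c, b), and the names `try_edge_switch` and `esmc_constant_w` are hypothetical.

```python
import random
from collections import deque


def is_connected(n, edges):
    """BFS connectivity test on nodes 0..n-1."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen = [False] * n
    seen[0] = True
    queue, count = deque([0]), 1
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if not seen[v]:
                seen[v] = True
                count += 1
                queue.append(v)
    return count == n


def try_edge_switch(edges, edge_set, rng):
    """Attempt one degree-preserving edge switch in place.

    Edges (a, b) and (c, d) become (a, d) and (c, b). The attempt is
    rejected if it would create a self-loop or a multiple edge.
    """
    i, j = rng.sample(range(len(edges)), 2)
    a, b = edges[i]
    c, d = edges[j]
    if rng.random() < 0.5:           # choose one of the two possible switches
        c, d = d, c
    new1, new2 = tuple(sorted((a, d))), tuple(sorted((c, b)))
    if a == d or c == b or new1 in edge_set or new2 in edge_set:
        return False
    edge_set.difference_update((edges[i], edges[j]))
    edge_set.update((new1, new2))
    edges[i], edges[j] = new1, new2
    return True


def esmc_constant_w(n, edges, w, transitions, seed=None):
    """Hypothetical constant-w chain: w switch attempts per connectivity test."""
    rng = random.Random(seed)
    edges = [tuple(sorted(e)) for e in edges]
    edge_set = set(edges)
    for _ in range(transitions):
        saved_edges, saved_set = list(edges), set(edge_set)
        for _ in range(w):
            try_edge_switch(edges, edge_set, rng)
        if not is_connected(n, edges):
            # Undo all w switches: the chain stays at the previous graph.
            edges, edge_set = saved_edges, saved_set
    return edges
```

Copying the edge list before each transition costs O(m), which matches the cost of the connectivity test that follows, so the sketch keeps the undo step simple rather than recording individual switches.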

We have also introduced a new heuristic for adjusting w that depends on
the probability that the graph being generated remains connected after an edge
switch is attempted. In order to calculate this probability, we use a new
connectivity test that has the same time and space complexities as depth-first
search (cf. Appendix A). Even though the resulting method cannot always be
modeled as a Markov chain, we showed that there are circumstances under which
it too converges to the uniform distribution.

One of the main issues regarding generation methods based on Markov chains 
is determining the number of transitions to be performed until the Markov 
chain is satisfactorily close to its stationary distribution. We have approached 
this issue by resorting to the pragmatic procedure of computing, after each 
transition, a certain function of the graph being generated, and halting the 
generation when the average of this function over all transitions seems to have 
converged. A proper choice for this function is essential to the efficacy of the 
method, but appears to require consideration on a case-by-case basis. 
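One possible form of such a pragmatic halting rule is sketched below in Python. The rule is an illustrative assumption and may differ from the exact criterion, with parameters δ and γ, used in our experiments: generation halts once the running average of ρ has varied, relative to its magnitude, by less than γ over the last δ transitions. The class name `RunningAverageHalt` is hypothetical.

```python
from collections import deque


class RunningAverageHalt:
    """Hypothetical halting rule based on the running average of rho(t).

    Halts once the running average has varied, relative to its magnitude,
    by less than `gamma` over the last `delta` transitions. This rule is an
    illustrative assumption, not necessarily the paper's exact criterion.
    """

    def __init__(self, delta, gamma):
        self.delta = delta
        self.gamma = gamma
        self.total = 0.0
        self.t = 0
        # Running averages after the most recent delta + 1 transitions.
        self.recent = deque(maxlen=delta + 1)

    def update(self, rho_value):
        """Record rho after one transition; return True if generation may halt."""
        self.t += 1
        self.total += rho_value
        self.recent.append(self.total / self.t)
        if len(self.recent) <= self.delta:
            return False
        lo, hi = min(self.recent), max(self.recent)
        scale = max(abs(lo), abs(hi))
        if scale == 0.0:
            return True  # the running average has been exactly zero throughout
        return (hi - lo) / scale < self.gamma
```

Because the rule only inspects the running average, the cost per transition is dominated by computing ρ itself, as discussed for the two halting conditions of Section 4.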

We have given experimental results for power-law-based degree sequences.
Our results cover two previous heuristics for adjusting w and also our new
heuristic, for two distinct halting criteria. They show that our heuristic, on
average, outperforms the two existing heuristics on power laws for which
2 ≤ τ ≤ 3.

The ESMC method can be especially useful to generate a group of connected 
random graphs having the same degree sequence. After obtaining the first 
graph, we can continue performing a relatively small number of transitions to 
generate each additional instance, without having to run the method from its 
beginning. Finally, the ESMC method can be extended to generate random 
graphs having a given degree sequence and another desired property (e.g., graphs
having the clustering coefficient limited to a given interval). We need only find 
a means of obtaining an initial graph having that property, then obtain an 
efficient procedure to test whether a graph has that property, and also show 
the irreducibility of the Markov chain, that is, show that there is a sequence 
of edge-switching attempts connecting any two graphs having the given degree 
sequence and the desired property. 
A The new connectivity test

Given a graph G_i of 𝒢_D, we show how a modified depth-first search on G_i can
be used to obtain p_i in addition to testing whether G_i is connected. Let S_i be
the directed graph induced by a depth-first search on G_i. This graph contains
the same nodes as G_i and a directed edge for each edge of G_i. The direction
of an edge in S_i is the direction along which the search traverses the edge for
the first time. Let (u_j → u_k) be an edge of S_i. We say that u_j is the parent
of u_k (or, equivalently, u_k is a child of u_j) if the search visits u_k for the
first time from u_j. Edge (u_j → u_k) is then called a tree edge, as it is part of
a directed spanning tree rooted at the start node of the search. If the search
does not visit u_k for the first time from u_j, then (u_j → u_k) is called a back
edge, as it necessarily represents a move toward an already visited node. Figure
7 shows an example S_i; nodes are numbered in such a way that an edge is a tree
edge if and only if it leads from a lower-numbered node to a higher-numbered one.
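The classification of the edges of S_i can be sketched in Python as an iterative depth-first search. The sketch assumes an adjacency-dict representation of a simple undirected graph and uses the hypothetical name `dfs_classify`. Nodes receive visit numbers in the order in which they are first reached, tree edges are directed from parent to child, and back edges are directed from descendant to ancestor, which is the direction in which the search first traverses them. Since a depth-first search of an undirected graph produces no cross edges, every non-tree edge leading to an earlier-visited node other than the parent is a back edge.

```python
def dfs_classify(adj, root):
    """Return (order, parent, tree_edges, back_edges) for a DFS from root.

    order[v] is v's visit number, so tree edges go from lower to higher
    numbers and back edges from higher to lower numbers, as in Figure 7.
    """
    order = {root: 0}
    parent = {root: None}
    tree_edges, back_edges = [], []
    stack = [(root, iter(adj[root]))]
    while stack:
        u, neighbors = stack[-1]
        for v in neighbors:
            if v not in order:                 # first visit: tree edge (u -> v)
                order[v] = len(order)
                parent[v] = u
                tree_edges.append((u, v))
                stack.append((v, iter(adj[v])))
                break
            if v != parent[u] and order[v] < order[u]:
                back_edges.append((u, v))      # v is an ancestor of u
        else:
            stack.pop()                        # all neighbors of u explored
    return order, parent, tree_edges, back_edges
```

Each neighbor iterator is kept on the stack and resumed after the child returns, so every edge is examined twice overall and the search runs in O(m) time for a connected graph.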

The level of a node u_j in S_i is the length of the shortest directed path from
the root to u_j. The descent and ancestry of u_j in S_i are, respectively, the set
of nodes toward which a tree path exists from u_j and the set of nodes from
which a tree path exists toward u_j. Node u_j is excluded from either set. Let
(u_j → u_k) be a tree edge and (u_x → u_y) a back edge. We say that (u_x → u_y)
covers (u_j → u_k) if u_x = u_k or u_x belongs to the descent of u_k, and
furthermore u_y = u_j or u_y belongs to the ancestry of u_j. In Figure 7, edge
(u_4 → u_2) covers edges (u_2 → u_3) and (u_3 → u_4).
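The covering relation can be checked directly from the parent pointers of the search tree, as in the following Python sketch. The function names are hypothetical, and the quadratic `cover_sets` helper is for illustration only; it is not the linear-time procedure developed in this appendix. A tree edge whose cover set is empty is a bridge.

```python
def is_ancestor_or_self(a, b, parent):
    """True if node a lies on the tree path from the root down to node b."""
    while b is not None:
        if b == a:
            return True
        b = parent[b]
    return False


def covers(back_edge, tree_edge, parent):
    """Does back edge (x -> y) cover tree edge (j -> k)?

    As in the definition: x must be k or in the descent of k, and y must
    be j or in the ancestry of j.
    """
    x, y = back_edge
    j, k = tree_edge
    return is_ancestor_or_self(k, x, parent) and is_ancestor_or_self(y, j, parent)


def cover_sets(tree_edges, back_edges, parent):
    """Map each tree edge to the back edges covering it (quadratic sketch)."""
    return {t: [b for b in back_edges if covers(b, t, parent)] for t in tree_edges}
```

On a tree path u_1 → u_2 → u_3 → u_4 with the back edge (u_4 → u_2), only (u_2 → u_3) and (u_3 → u_4) are covered, so (u_1 → u_2) would be a bridge if no other back edge existed.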

Let us proceed to the calculation of p_i, which depends on μ_i^b and μ_i^p.

Figure 7: An example S_i. Nodes are visited in the order u_1, ..., u_20.


A.1 Handling bridge pairs

Clearly, the number of nonadjacent, non-neighbor bridge pairs of G_i, on which
μ_i^b is based, can be obtained from the number of bridges of G_i, the number
of pairs of adjacent bridges of G_i, and the number of neighbor bridge pairs of
G_i: if G_i has b bridges, it is the total number of bridge pairs, $\binom{b}{2}$,
minus the numbers of adjacent and of neighbor bridge pairs. During the search,
we count some undirected paths in S_i (i.e., paths whose edges' directions are
ignored) having certain special properties. For each node u_j, we use the
counters B_j^{b}, B_j^{bb}, B_j^{bbb}, B_j^{nb}, and B_j^{bnb} to record how many
undirected paths of S_i start at u_j, proceed through nodes in the descent of u_j
exclusively, and moreover consist in G_i of, respectively, one bridge, two bridges,
three bridges, a non-bridge edge followed by a bridge, and two bridges separated
by a non-bridge edge. We now explain how these counters can be used to obtain
the number of nonadjacent, non-neighbor bridge pairs of G_i and also how we
can calculate them during the search.

The number of bridges of G_i can be easily obtained during the search, as an
edge of G_i is a bridge if and only if it is a tree edge of S_i that is not covered
by any back edge (e.g., (u_1 → u_8) in Figure 7). What we do is simply to
accumulate B_j^{b} into a global counter as the exploration of u_j concludes.
Obtaining the number of pairs of adjacent bridges of G_i is also simple, since it
is a matter of accumulating, as the exploration of u_j concludes, the number of
pairs of adjacent bridges that are incident to u_j and its descendants, that is,

$$\binom{B_j^{b}}{2} + B_j^{bb}. \tag{6}$$

As for obtaining the number of neighbor bridge pairs, note first that the
edge connecting the two bridges can be of three types. It can be another bridge
(e.g., (u_1 → u_8) connecting (u_1 → u_11) to (u_8 → u_9), and (u_8 → u_9)
connecting (u_1 → u_8) to (u_9 → u_10) in Figure 7); it can be a tree edge that
is not a bridge (e.g., (u_1 → u_17) connecting (u_1 → u_11) to (u_17 → u_18),
and (u_11 → u_12) connecting (u_1 → u_11) to (u_12 → u_15) in Figure 7); and,
finally, it can be a back edge (e.g., (u_19 → u_1) connecting (u_1 → u_11) to
(u_19 → u_20) in Figure 7). Let then (u_j → u_k) be a tree edge. As the
exploration of u_k concludes, for each u_x in u_k's descent from which a back
edge exists toward u_k, we add B_x^{b} to B_k^{nb}. We then accumulate

$$B_k^{bb}\,(B_k^{b} - 1) + B_k^{bbb} + B_k^{b} B_k^{nb} + B_k^{bnb} \tag{7}$$

into the global counter of neighbor bridge pairs of G_i. When at last the search
returns to u_k's parent u_j, we do one of the following: if (u_j → u_k) is a
bridge, then we increment B_j^{b} and add B_k^{b} to B_j^{bb}, B_k^{bb} to
B_j^{bbb}, and B_k^{nb} to B_j^{bnb}; otherwise, we add B_k^{b} to B_j^{nb}.
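The bridge-pair counting of this subsection can be sketched in Python as a recursive depth-first search. The sketch assumes an adjacency-dict representation and uses the hypothetical names `bridge_pair_counts`, `b`, `bb`, `bbb`, `nb`, and `bnb` for the counters B^{b}, B^{bb}, B^{bbb}, B^{nb}, and B^{bnb}. Bridges are detected with the usual low-point values, which is equivalent to checking that a tree edge is not covered by any back edge. The function returns the numbers of bridges, of adjacent bridge pairs (accumulated with (6)), of neighbor bridge pairs (accumulated with (7)), and of the remaining nonadjacent, non-neighbor bridge pairs.

```python
import sys
from math import comb


def bridge_pair_counts(adj, root):
    """Return (bridges, adjacent pairs, neighbor pairs, other pairs)."""
    order, low, back_in = {}, {}, {}
    b, bb, bbb, nb, bnb = ({} for _ in range(5))
    total = {"bridges": 0, "adjacent": 0, "neighbor": 0}

    def explore(k, parent):
        order[k] = low[k] = len(order)
        back_in[k] = []                             # tails of back edges into k
        for counter in (b, bb, bbb, nb, bnb):
            counter[k] = 0
        for v in adj[k]:
            if v not in order:                      # tree edge (k -> v)
                explore(v, k)
                low[k] = min(low[k], low[v])
                if low[v] > order[k]:               # (k -> v) is a bridge
                    b[k] += 1
                    bb[k] += b[v]
                    bbb[k] += bb[v]
                    bnb[k] += nb[v]
                else:                               # non-bridge tree edge
                    nb[k] += b[v]
            elif v != parent and order[v] < order[k]:
                low[k] = min(low[k], order[v])      # back edge (k -> v)
                back_in[v].append(k)
        for x in back_in[k]:                        # back edges (x -> k)
            nb[k] += b[x]
        total["bridges"] += b[k]
        total["adjacent"] += comb(b[k], 2) + bb[k]                           # (6)
        total["neighbor"] += bb[k] * (b[k] - 1) + bbb[k] + b[k] * nb[k] + bnb[k]  # (7)

    sys.setrecursionlimit(max(1000, 10 * len(adj)))
    explore(root, None)
    n_b = total["bridges"]
    other = comb(n_b, 2) - total["adjacent"] - total["neighbor"]
    return n_b, total["adjacent"], total["neighbor"], other
```

For example, on the path u_0, u_1, u_2, u_3, u_4 all four edges are bridges, and `bridge_pair_counts` returns (4, 3, 2, 1) regardless of the start node. A recursive search keeps the sketch close to the text; for very deep graphs an explicit stack would avoid Python's recursion limit.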

A.2 Handling pair cuts 

The number of nonadjacent, non-neighbor pair cuts, which is the basis for
computing μ_i^p, can be obtained by calculating the number of pair cuts, the
number of adjacent pair cuts, and the number of neighbor pair cuts. Two edges
of S_i form a pair cut if and only if they are covered by one single common back
edge (e.g., (u_1 → u_2) and (u_4 → u_5) in Figure 7). In order to identify pair
cuts during the search, for each node we store the back edge that connects either
the node itself or one of its descendants to its lowest-level ancestor. If more than
one back edge reaches the same node, then we need store neither, since no edge
through which the search is yet to backtrack can be uniquely covered by any of
them.

Let (u_j → u_k) be a tree edge. Assume that (u_j → u_k) is covered only
by the edge (u_x → u_y) and let C_{(u_x→u_y)} be a counter of the number of
edges covered only by (u_x → u_y). Clearly, the number of pair cuts either
covered by (u_x → u_y) or including this edge is $\binom{C_{(u_x \to u_y)} + 1}{2}$,
since (u_x → u_y) also participates in a pair cut along with each of the edges
that it covers. This number is accumulated into a global counter of the pair cuts
of G_i as the search detects that no edge through which it is yet to backtrack is
covered only by (u_x → u_y).
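A direct, quadratic-time Python sketch of this counting rule follows. It implements the characterization stated above (the edges of a pair cut are covered by one single common back edge) rather than the linear-time bookkeeping of this subsection; the function name `pair_cut_count` and the adjacency-dict input are assumptions. For each back edge (u_x → u_y) that is the only cover of C tree edges, it adds comb(C + 1, 2).

```python
from collections import Counter
from math import comb


def pair_cut_count(adj, root):
    """Count pair cuts via back edges that are the sole cover of tree edges."""
    parent, order, back_edges = {root: None}, {root: 0}, []
    stack = [(root, iter(adj[root]))]
    while stack:                                  # iterative DFS
        u, neighbors = stack[-1]
        for v in neighbors:
            if v not in order:
                parent[v], order[v] = u, len(order)
                stack.append((v, iter(adj[v])))
                break
            if v != parent[u] and order[v] < order[u]:
                back_edges.append((u, v))         # directed toward the ancestor
        else:
            stack.pop()

    def on_root_path(a, b):                       # a is b or an ancestor of b
        while b is not None:
            if a == b:
                return True
            b = parent[b]
        return False

    only_cover = Counter()                        # C_(x -> y) for each back edge
    for k, j in parent.items():                   # tree edge (j -> k)
        if j is None:
            continue
        covering = [(x, y) for x, y in back_edges
                    if on_root_path(k, x) and on_root_path(y, j)]
        if len(covering) == 1:
            only_cover[covering[0]] += 1
    return sum(comb(c + 1, 2) for c in only_cover.values())
```

On a cycle with five nodes, the single back edge is the only cover of the four tree edges, so the sketch reports comb(5, 2) = 10 pair cuts, that is, every pair of the cycle's five edges.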

In order to identify adjacent and neighbor pair cuts, we need to keep some
information regarding (u_x → u_y) and the edges covered only by it as the search
backtracks from u_k. Besides (u_x → u_y) itself and C_{(u_x→u_y)}, we also need
to retain information on three other nodes, which we denote by v_1, v_2, and v_3.
Nodes v_1 and v_2 are the two lowest-level nodes such that the edge between each
of them and its parent is covered only by (u_x → u_y). Node v_3 is the
highest-level node such that the edge between it and its parent is covered only
by (u_x → u_y). For example, as the search backtracks from u_5 in the case of
Figure 7, we store the back edge (u_7 → u_1) and let v_1 = u_5, v_2 = u_6, and
v_3 = u_7.

Assume now that the search has concluded the exploration of all the neighbors
of u_j. In the case of a child u_k of u_j, assume as above that (u_j → u_k) is
covered only by the back edge (u_x → u_y). Adjacent pair cuts can be identified
in three scenarios: when u_k = u_x (Figure 8(a)), when u_j = u_y (Figure 8(b)),
and when u_k is the parent of v_1 (Figure 8(c)). As for neighbor pair cuts, there
are five cases. The first case happens when a tree edge connects (u_x → u_y)
to one of the tree edges covered only by it; this can be identified either when
u_x is a child of u_k (Figure 8(d)) or when u_y is the parent of u_j (Figure 8(e)).
The second case occurs when another back edge connects (u_x → u_y) to one of
the tree edges covered only by it; this can be identified either by the existence
of the back edge (u_x → u_k) (Figure 8(f)) or by the existence of the back edge
(u_j → u_y) (Figure 8(g)). The third case occurs when (u_x → u_y) connects two
edges covered only by it, and can be identified when u_j = u_y and u_x = v_3
(Figure 8(h)). The fourth case occurs when a tree edge connects two other tree
edges, the latter two covered only by (u_x → u_y); this case can be identified by
v_1 or v_2 being two levels above u_k (Figure 8(i)). The fifth and last case
happens when another back edge connects two edges covered only by
(u_x → u_y), which can be identified by the existence of a back edge from the
parent of v_1 to u_k (Figure 8(j)). After updating the number of adjacent pair
cuts and neighbor pair cuts before the search backtracks from u_j, we increment
C_{(u_x→u_y)} and let v_2 = v_1 and v_1 = u_k. If no node is currently marked as
v_3, then we also let v_3 = u_k.

Figure 8: Scenarios for the occurrence of adjacent and neighbor pair cuts.


A.3 Complexity 

Let us now discuss the space and time complexities of this modified depth-first
search. Clearly, the ESMC method requires Ω(m) space, since we need to store
an array with the edges of the graph being generated. During the search, for
each node u_k, we need to store its parent (say, u_j), its level, the B_k's, and
the back edge covering (u_j → u_k) that reaches the lowest-level node. If
(u_x → u_y) is this edge, then we also need to store the v_1, v_2, v_3, and
C_{(u_x→u_y)} corresponding to (u_j → u_k). Summing up over all nodes, this
information requires only O(n) space. Furthermore, for each node we keep a list
of the back edges arriving at it for the sake of handling the cases in Figure 8(f,
j), which requires O(m) space overall. Finally, we also keep a global n-element
array in which nodes register the back edges originating at them. This is needed
for identifying the occurrence of the scenario illustrated in Figure 8(g). In
summary, the modified depth-first search does not change the space complexity
of the ESMC method.

Obtaining the time complexity requires that we detail the steps performed
during the exploration of a node u_j of G_i. First we explore each neighbor u_k
of u_j and update the B_j's if (u_j → u_k) is a tree edge. Otherwise, if
(u_j → u_k) is a back edge, then we include it in the list of back edges arriving
at u_k and record the back edge that leaves u_j and arrives at its lowest-level
ancestor. After exploring the entire descent of u_j, for each node toward which
there is a back edge leaving u_j we set a mark in the n-element array. Then, for
each child u_k of u_j, we update the counters of pair cuts using the n-element
array, the information regarding the back edge, say (u_x → u_y), that reaches
the lowest-level node, and the list of back edges arriving at u_k. The edge
(u_x → u_y) may become the back edge that arrives at u_j's lowest-level
ancestor; in this case, we also update v_1, v_2, and v_3, which requires O(1)
time. We then reset all marks in the n-element array,^5 and for each back edge
arriving at u_j we update B_j^{nb}, which also requires only O(1) time per edge.
Finally, we conclude the exploration of u_j by updating the counters of adjacent
and neighbor bridge pairs using (6) and (7). We then see that a tree edge
(u_j → u_k) is visited at most three times: twice when u_j and u_k are exploring
their neighbors, and once more when u_j revisits the tree edges leaving it to
update the pair-cut counters and the back edge reaching the lowest-level node.
Each back edge (u_x → u_y), in turn, is visited at most six times: twice when u_x
and u_y are exploring their neighbors, twice when u_x sets and resets marks in
the n-element array, once when updating B_y^{nb}, and once more when the
parent of u_y is updating the number of neighbor pair cuts (cf. the cases
illustrated in Figure 8(f, j)). In conclusion, the time complexity of the modified
depth-first search is O(m), thus the same as that of the standard depth-first
search.


^5 Note that at this point only neighbors of u_j can be marked, so this step can
be performed without checking all n positions.